IMDB Movies Visualization

Author: Chisheng Li

Retrieve the updated IMDB datasets countries.list.gz, genres.list.gz and ratings.list.gz at ftp://ftp.fu-berlin.de/pub/misc/movies/database/.

1) Create a world map to display the average movie rating by country.

Open countries.list and ratings.list, match the records that share the same movie title, and write each title with its IMDB rating score and country of origin to countryRating.txt.


In [1]:
ratingsFile = open('ratings.list','r')
countriesFile = open('countries.list','r')
output = open('countryRating.txt','w')

In [2]:
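# Advance each file past its preamble to the start of the data.
# iter(f.readline, "") stops at end of file, so a missing header cannot loop forever.
for line in iter(countriesFile.readline, ""):
    if line == "COUNTRIES LIST\n":
        break
countriesFile.readline()

for line in iter(ratingsFile.readline, ""):
    if line == "MOVIE RATINGS REPORT\n":
        break
ratingsFile.readline()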
ratingsFile.readline()


Out[2]:
'New  Distribution  Votes  Rank  Title\n'

Extract the first movie record from ratings file


In [3]:
ratingLine = ratingsFile.readline().rstrip("\n")
ratingLine = ratingLine.split()
ratingMovieName = " ".join(ratingLine[3:len(ratingLine)])
rating = ratingLine[2]

Extract the first movie record from countries file


In [4]:
countriesLine = countriesFile.readline().rstrip("\n")
i = countriesLine.rfind(")")
countries = countriesLine[i+1:len(countriesLine)]
countries = countries.strip("\t")
while countriesLine[i]!="\t":
    i-=1
countriesMovieName = countriesLine[0:i+1]
countriesMovieName = countriesMovieName.replace("\t","")
countriesMovieName = countriesMovieName.rstrip(" ")
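
The index arithmetic above relies on the title and the country being separated by tabs. As a minimal sketch under that assumption, the same split can be written as a small function. `parse_country_line` is a hypothetical helper name that does not appear in the original notebook, and the sample line in the usage note is illustrative.

```python
def parse_country_line(line):
    """Split one countries.list record into (title, country).

    Assumes the title and the country are separated by one or more tabs.
    Returns None when the line contains no tab-separated country field.
    """
    # Splitting on tabs and dropping empty strings collapses runs of tabs
    fields = [f for f in line.rstrip("\n").split("\t") if f]
    if len(fields) < 2:
        return None
    title = fields[0].rstrip(" ")
    country = fields[-1]
    return title, country
```

For example, `parse_country_line('Example Movie (2006)\t\t\tUSA\n')` gives the title `'Example Movie (2006)'` and the country `'USA'`.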

In [5]:
while True:
    # Iterating over records from file
    if ratingLine == "" or countriesLine[0:3] == "---":
        break
    
    # Skipping episodes from serials
    if ratingMovieName.find("{") !=-1:
        ratingLine = ratingsFile.readline().rstrip("\n")
        if ratingLine == "":
            break
        ratingLine = ratingLine.split()
        ratingMovieName = " ".join(ratingLine[3:len(ratingLine)])
        rating = ratingLine[2]
        continue
    
    if countriesLine.find("{")!=-1 or countriesLine.find(")")==-1:
        countriesLine = countriesFile.readline().rstrip("\n")
        if countriesLine[0:3] == "---":
            break
        i = countriesLine.rfind(")")
        countries = countriesLine[i+1:len(countriesLine)]
        countries = countries.strip("\t")
        while countriesLine[i]!="\t":
            i-=1
        countriesMovieName = countriesLine[0:i+1]
        countriesMovieName = countriesMovieName.replace("\t","")
        countriesMovieName = countriesMovieName.rstrip(" ")
        continue
    
    # Adding matched record from ratings and countries file
    if ratingMovieName == countriesMovieName:
        output.write(ratingMovieName)
        output.write("\t")
        output.write(rating)
        output.write("\t")
        output.write(countries)
        output.write("\n")
        ratingLine = ratingsFile.readline().rstrip("\n")
        if ratingLine == "":
            break
        ratingLine = ratingLine.split()
        ratingMovieName = " ".join(ratingLine[3:len(ratingLine)])
        rating = ratingLine[2]
        
        countriesLine = countriesFile.readline().rstrip("\n")
        if countriesLine[0:3] == "---":
            break
        if countriesLine.find(")")==-1:
            continue
        i = countriesLine.rfind(")")
        countries = countriesLine[i+1:len(countriesLine)]
        countries = countries.strip("\t")
        while countriesLine[i] != "\t":
            i-=1
        countriesMovieName = countriesLine[0:i+1]
        countriesMovieName = countriesMovieName.replace("\t","")
        countriesMovieName = countriesMovieName.rstrip(" ")
        continue
    
    if ratingLine == "" or countriesLine[0:3] == "---":
        break
    
    while ratingMovieName < countriesMovieName:
        ratingLine = ratingsFile.readline().rstrip("\n")
        if ratingLine == "":
            break
        if ratingLine.find("{")!=-1:
            continue
        ratingLine = ratingLine.split()
        ratingMovieName = " ".join(ratingLine[3:len(ratingLine)])
        rating = ratingLine[2]
    
    while countriesMovieName < ratingMovieName:
        countriesLine = countriesFile.readline().rstrip("\n")
        if countriesLine[0:3] == "---":
            break
        if countriesLine.find(")") == -1 or countriesLine.find("{")!=-1:
            continue
        i = countriesLine.rfind(")")
        countries = countriesLine[i+1:len(countriesLine)]
        countries = countries.strip("\t")
        while countriesLine[i]!="\t":
            i-=1
        countriesMovieName = countriesLine[0:i+1]
        countriesMovieName = countriesMovieName.replace("\t","")
        countriesMovieName = countriesMovieName.rstrip(" ")
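
The loop above is in essence a merge join: both files are read in title order, the side with the smaller title is advanced, and a record is written when the titles match. The sketch below shows that idea on in-memory lists. The name `merge_join` and the sample records are hypothetical, and the sketch assumes both inputs are already sorted by title under Python's string comparison, which may not hold exactly for the real IMDB files. Like the loop above, it advances both inputs after a match, so each title is emitted once.

```python
def merge_join(ratings, countries):
    """Yield (title, rating, country) for titles present in both inputs.

    ratings:   iterable of (title, rating) pairs sorted by title
    countries: iterable of (title, country) pairs sorted by title
    """
    ratings = iter(ratings)
    countries = iter(countries)
    r = next(ratings, None)
    c = next(countries, None)
    while r is not None and c is not None:
        if r[0] == c[0]:
            # Titles match: emit the merged record and advance both inputs
            yield r[0], r[1], c[1]
            r = next(ratings, None)
            c = next(countries, None)
        elif r[0] < c[0]:
            # The rated title has no country record yet: advance the ratings
            r = next(ratings, None)
        else:
            # The country record has no rating: advance the countries
            c = next(countries, None)


# Illustrative records, not taken from the IMDB files
sample_ratings = [("A (2000)", "7.1"), ("B (2001)", "6.0"), ("D (2003)", "8.2")]
sample_countries = [("A (2000)", "USA"), ("C (2002)", "UK"), ("D (2003)", "France")]
merged = list(merge_join(sample_ratings, sample_countries))
```

Here `merged` holds the records for "A (2000)" and "D (2003)" only, since "B (2001)" has no country and "C (2002)" has no rating.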

Close the files


In [6]:
ratingsFile.close()
countriesFile.close()
output.close()

Open countryRating.txt and calculate the average IMDB movie rating for each country


In [7]:
import csv
from collections import defaultdict, namedtuple
from operator import attrgetter, itemgetter
from itertools import imap

In [8]:
CountryRating = namedtuple('CountryRating', 'countryorigin ratingscore')

fieldnames = 'name', 'score', 'country'
score_and_country = itemgetter('score', 'country')
ratings = defaultdict(list)

In [9]:
# Relabel some countries to their present-day counterparts.
# Note: Dominica and the Dominican Republic are distinct countries; the mapping
# is kept as in the original run so that the averages printed below still apply.
relabel = {
    'West Germany': 'Germany',
    'East Germany': 'Germany',
    'North Vietnam': 'Vietnam',
    'Korea': 'South Korea',
    'Palestine': 'Occupied Palestinian Territory',
    'Soviet Union': 'Russia',
    'Dominica': 'Dominican Republic',
    'Yugoslavia': 'Federal Republic of Yugoslavia',
}

with open("countryRating.txt", "r") as moviefile:
    movies = csv.DictReader(moviefile, fieldnames=fieldnames, delimiter='\t')
    for score, country in imap(score_and_country, movies):
        country = relabel.get(country, country)
        ratings[country].append(float(score))

In [10]:
average = lambda alist: sum(alist) / len(alist)
average_ratings = [CountryRating(country, average(scores)) for country, scores in ratings.iteritems()]

print "\nCountries with the highest average movie rating"
print "-----------------------------------------------"
sorted_ratings = sorted(average_ratings, key=attrgetter('ratingscore'), reverse=True)
for i, j in enumerate(sorted_ratings):
    print '%i. %s \t%g' % (i + 1, j.countryorigin, j.ratingscore)


Countries with the highest average movie rating
-----------------------------------------------
1. Tonga 	8.2
2. Croatia 	8.06774
3. Libya 	8.06667
4. Gambia 	7.8
5. Moldova 	7.69333
6. Swaziland 	7.5
7. Macao 	7.5
8. Burundi 	7.5
9. Oman 	7.46667
10. Federal Republic of Yugoslavia 	7.39203
11. Tanzania 	7.35385
12. Mongolia 	7.35333
13. United Arab Emirates 	7.31702
14. Sierra Leone 	7.3
15. Mauritania 	7.3
16. Marshall Islands 	7.3
17. Botswana 	7.3
18. Uganda 	7.275
19. Monaco 	7.23077
20. Nepal 	7.2026
21. Turkmenistan 	7.2
22. Yemen 	7.2
23. Kosovo 	7.18421
24. Afghanistan 	7.15833
25. Jordan 	7.14839
26. Azerbaijan 	7.14655
27. Namibia 	7.14286
28. Georgia 	7.13077
29. Syria 	7.1283
30. Romania 	7.11671
31. Lithuania 	7.10692
32. Sri Lanka 	7.10088
33. San Marino 	7.1
34. Qatar 	7.09048
35. Kenya 	7.08276
36. Bulgaria 	7.08055
37. Honduras 	7.06667
38. Fiji 	7.06667
39. Somalia 	7.05
40. Iraq 	7.0325
41. Myanmar 	7.02
42. El Salvador 	7.01429
43. Bangladesh 	7.00597
44. Vanuatu 	7
45. New Caledonia 	7
46. Republic of Macedonia 	6.94793
47. Barbados 	6.92857
48. Senegal 	6.91026
49. Rwanda 	6.86667
50. Occupied Palestinian Territory 	6.85484
51. Iceland 	6.85346
52. Kyrgyzstan 	6.85333
53. Costa Rica 	6.83158
54. Bosnia and Herzegovina 	6.8306
55. Kuwait 	6.82353
56. Serbia 	6.82189
57. Lebanon 	6.81826
58. Albania 	6.81301
59. Serbia and Montenegro 	6.8009
60. Guam 	6.8
61. Japan 	6.79986
62. Jamaica 	6.79643
63. Panama 	6.79474
64. UK 	6.7854
65. Ethiopia 	6.78235
66. New Zealand 	6.77252
67. Maldives 	6.76
68. Sudan 	6.75714
69. Estonia 	6.75358
70. Ghana 	6.75172
71. Ireland 	6.74568
72. Venezuela 	6.73376
73. Latvia 	6.73333
74. Cuba 	6.73033
75. Uruguay 	6.72893
76. Hungary 	6.72366
77. Montenegro 	6.71333
78. Cyprus 	6.70141
79. Burkina Faso 	6.7
80. Bhutan 	6.7
81. Zimbabwe 	6.69333
82. Vietnam 	6.69147
83. Australia 	6.68541
84. Netherlands 	6.67424
85. Paraguay 	6.65789
86. Cameroon 	6.65556
87. Israel 	6.6504
88. Trinidad and Tobago 	6.648
89. Benin 	6.64
90. Greenland 	6.63333
91. Slovenia 	6.63238
92. Guinea 	6.62
93. Tunisia 	6.60192
94. Niger 	6.575
95. Angola 	6.56364
96. Cambodia 	6.56
97. U.S. Virgin Islands 	6.56
98. Tajikistan 	6.55
99. Guinea-Bissau 	6.55
100. Colombia 	6.54296
101. Algeria 	6.51449
102. Switzerland 	6.51445
103. Siam 	6.5
104. Belize 	6.5
105. Guatemala 	6.48182
106. Ivory Coast 	6.46875
107. Haiti 	6.46667
108. Faroe Islands 	6.46667
109. Canada 	6.46247
110. Ukraine 	6.46186
111. Suriname 	6.45
112. Nigeria 	6.44436
113. Belgium 	6.4405
114. Kazakhstan 	6.43733
115. USA 	6.43377
116. Netherlands Antilles 	6.43333
117. Morocco 	6.43077
118. Peru 	6.4205
119. Singapore 	6.41624
120. Guyana 	6.4
121. French Polynesia 	6.4
122. Democratic Republic of the Congo 	6.4
123. Russia 	6.39687
124. Saudi Arabia 	6.3963
125. Gabon 	6.3875
126. Czechoslovakia 	6.37453
127. South Korea 	6.34578
128. Bahrain 	6.34167
129. Germany 	6.33811
130. Austria 	6.32377
131. Malta 	6.3
132. Guadeloupe 	6.3
133. Ecuador 	6.29118
134. Isle of Man 	6.26667
135. Mali 	6.2625
136. Chile 	6.26112
137. Mozambique 	6.25
138. Portugal 	6.24909
139. India 	6.24071
140. France 	6.2388
141. Papua New Guinea 	6.23333
142. Puerto Rico 	6.22886
143. Slovakia 	6.22603
144. Argentina 	6.22107
145. Poland 	6.20673
146. Zaire 	6.2
147. South Africa 	6.1699
148. Sweden 	6.16565
149. Norway 	6.16503
150. Laos 	6.14
151. Taiwan 	6.1373
152. Philippines 	6.13619
153. Uzbekistan 	6.13611
154. Dominican Republic 	6.13514
155. Timor-Leste 	6.13333
156. Luxembourg 	6.1188
157. China 	6.10793
158. Czech Republic 	6.10112
159. Nicaragua 	6.1
160. Togo 	6.1
161. Burma 	6.1
162. Liechtenstein 	6.09167
163. Mauritius 	6.08333
164. Iran 	6.07738
165. Finland 	6.06601
166. Hong Kong 	6.06176
167. Denmark 	6.05739
168. Egypt 	6.03583
169. Brazil 	6.03214
170. Thailand 	6.01719
171. Mexico 	6.01637
172. Niue 	6
173. Italy 	5.99527
174. Spain 	5.97641
175. Brunei Darussalam 	5.975
176. Antigua and Barbuda 	5.93333
177. Greece 	5.91729
178. Chad 	5.91667
179. Turkey 	5.91163
180. Belarus 	5.91111
181. North Korea 	5.90741
182. Madagascar 	5.9
183. Bolivia 	5.8
184. Congo 	5.8
185. Malaysia 	5.75088
186. Indonesia 	5.60554
187. Bahamas 	5.53333
188. Zambia 	5.5
189. Pakistan 	5.46235
190. Cape Verde 	5.35
191. Gibraltar 	5.26667
192. Aruba 	5.23333
193. Saint Lucia 	4.9
194. Malawi 	4.9
195. Armenia 	4.77431
196. Tuvalu 	4.7
197.  	4.65
198. Eritrea 	4.6
199. Andorra 	4.6
200. Cayman Islands 	4.5
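
The per-country averaging behind the ranking above can be sketched without the Python 2 specific `imap` and `iteritems`, so that it runs on both Python 2.7 and 3. The function names `average_by_country` and `rank` and the sample rows are hypothetical and serve only to illustrate the grouping step.

```python
from collections import defaultdict


def average_by_country(rows):
    """rows: iterable of (country, score) pairs. Returns {country: mean score}."""
    scores = defaultdict(list)
    for country, score in rows:
        scores[country].append(float(score))
    return {country: sum(vals) / len(vals) for country, vals in scores.items()}


def rank(averages):
    """Return (country, mean) pairs sorted from highest to lowest mean."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Illustrative rows, not taken from countryRating.txt
sample_rows = [("USA", "6.0"), ("USA", "7.0"), ("Tonga", "8.2")]
ranking = rank(average_by_country(sample_rows))
```

In this sample the single Tonga title outranks the two USA titles, which shows how a country with very few rated titles can reach the top of such a list.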

World map visualization


In [11]:
from IPython.display import Image
Image(filename='world map.png')


Out[11]:

2) Create a wordle from the titles of all movies in a genre.

Open genres.list and output every Comedy movie title to comedyNames.txt.


In [12]:
# Create a wordle from the titles of all movies in a genre
# 1. start line at "!Next?" (1994)	Documentary
# 2. discard the titles with {}
# 3. select movie titles only from 1 genre (eg. Comedy)
# 4. remove "" from the movie titles
genresFile = open('genres.list','r')
output = open('comedyNames.txt','w') #or try comedyNames.list

# To start readline() at the right line
while True:
    if genresFile.readline() == "8: THE GENRES LIST\n":
        break;
genresFile.readline()
genresFile.readline()


Out[12]:
'\n'

In [13]:
# Blank out the trailing markers (TV), (V), (VG), then split each line on whitespace
for line in genresFile.readlines():
    genresLine = line.replace("(VG)"," ").replace("(TV)"," ").replace("(V)"," ").rstrip("\n").split()

    # Skip blank or malformed lines that lack a year and genre token
    if len(genresLine) < 2:
        continue

    # The genre is the last token on the line
    genres = genresLine[-1]

    # Only keep the movies that are listed under the genre 'Comedy'
    if genres.find("Comedy") != -1:

        # The token before the genre is the year, e.g. (1994); for episode
        # records it is the end of the {...} episode block instead
        genresYear = genresLine[-2]
        # Skip episode records, i.e. titles followed by {...}
        if genresYear.find("}") == -1:
            genresMovieName = " ".join(genresLine[0:-2])
            # Remove the quotation marks around the movie name
            if genresMovieName.startswith('"') and genresMovieName.endswith('"'):
                genresMovieName = genresMovieName[1:-1]

            output.write(genresMovieName)
            output.write("\n")

Closing files


In [14]:
genresFile.close()
output.close()
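
The token-based parsing in this step can also be expressed as one regular expression per record. The sketch below is an alternative under stated assumptions, not the notebook's method: it assumes each record has the form title, year in parentheses (optionally with a /I, /II, ... suffix), optional extras such as (TV) or an {episode} block, then tabs and a single-word genre. `GENRE_LINE` and `parse_genre_line` are hypothetical names.

```python
import re

# title, (year[/ROMAN]), optional extras such as (TV) or {episode}, tabs, genre
GENRE_LINE = re.compile(
    r'^(?P<title>.+?)\s+\((?P<year>\d{4}|\?{4})(?:/[IVXLC]+)?\)'
    r'(?P<extras>[^\t]*)\t+(?P<genre>\S+)\s*$'
)


def parse_genre_line(line):
    """Return (title, year, genre), or None for episodes and unparsable lines.

    The year is None when the file lists it as ????.
    """
    match = GENRE_LINE.match(line.rstrip("\n"))
    if match is None or "{" in match.group("extras"):
        return None
    title = match.group("title")
    # Series titles are wrapped in double quotes; strip them
    if title.startswith('"') and title.endswith('"'):
        title = title[1:-1]
    year = match.group("year")
    if year == "????":
        year = None
    return title, year, match.group("genre")
```

Applied to the first record named in the comments above, `parse_genre_line('"!Next?" (1994)\tDocumentary')` yields the title `!Next?`, the year `1994` and the genre `Documentary`.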

Read comedyNames.txt, remove a selected list of stopwords from the movie names, and write the filtered titles to a new text file.


In [15]:
import re

# Set a list of stopwords to be removed from the movie titles
stopwords = set(('A', 'al', 'Al', 'auf', 'Auf', 'da', 'Da', 'Dans', 'das', 'Das', 'de', 'De', 
                 'del', 'Del', 'der', 'Der', 'des', 'Des', 'di', 'Die', 'du', 'ein', 'Ein',
                  'el', 'El', 'en', 'En', 'es', 'et', 'Et', 'Ich', 'il', 'Il', 'ja', 'la', 
                  'La', 'las', 'Las', 'le', 'Le', 'les', 'Les', 'lo', 'Lo', 'los', 'Los', 
                  'mi', 'Mi', 'na', 'ni', 'Por', 'que', 'Que', 'se', 'Se', 'Um', 'un', 'Un', 
                  'una', 'Una', 'und', 'une', 'Une'))

comedyNames = open('comedyNames.txt')

In [16]:
OUT = open('comedyFiltered.txt', 'w')
for line in comedyNames.readlines():
    movieNames = line.rstrip("\n").split()
    # Drop the stopwords; titles without stopwords pass through unchanged
    filteredNames = " ".join([i for i in movieNames if i not in stopwords])
    OUT.write(filteredNames)
    OUT.write("\n")
    #print filteredNames
OUT.close()
comedyNames.close()
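
The filtering step can be isolated in a small function, which makes it easy to check on a single title. This is a sketch: `filter_title` is a hypothetical name, and the matching is case-sensitive, as with the stopword set defined above, so 'La' and 'la' must both be listed to remove both.

```python
def filter_title(title, stopwords):
    """Return the title with every whitespace-separated stopword removed.

    Matching is exact and case-sensitive.
    """
    return " ".join(word for word in title.split() if word not in stopwords)


# Illustrative subset of the stopword set defined above
sample_stopwords = set(('La', 'la', 'en', 'En', 'A'))
filtered = filter_title("La Vie en Rose", sample_stopwords)
```

Here `filtered` is "Vie Rose", while a title without any stopword is returned unchanged.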

Wordle visualization


In [19]:
Image(filename='Comedy title wordle.png')


Out[19]:

3) Create a line chart of the number of movies made in each genre over time.

Create a line plot of the number of movies made in each genre (for an individual country or all countries combined) over time. A line plot is used because the new IBM Many Eyes no longer offers a stack graph visualization.

  1. start line at "!Next?" (1994) Documentary
  2. discard the titles with {}
  3. discard the movies where the Year is (????)
  4. remove (), remove /I, /II, /III, /IV etc from the movie year

In [20]:
genresFile = open('genres.list','r')
output = open('genreYear.txt','w')

#To start readline() at the right line
while True:
    if genresFile.readline() == "8: THE GENRES LIST\n":
        break;
genresFile.readline()
genresFile.readline()


Out[20]:
'\n'

In [21]:
# Blank out the trailing markers (TV), (V), (VG), then split each line on whitespace
for line in genresFile.readlines():
    genresLine = line.replace("(VG)"," ").replace("(TV)"," ").replace("(V)"," ").rstrip("\n").split()

    # Skip blank or malformed lines that lack a year and genre token
    if len(genresLine) < 2:
        continue

    # The genre is the last token on the line
    genres = genresLine[-1]

    # Skip the 1 movie where the genre is _//bbfc.co.uk/releases/import-export-2008-0_
    if genres.find("_") != -1:
        continue

    genresYear = genresLine[-2]

    # Skip episode records, i.e. titles followed by {...}
    if genresYear.find("}") != -1:
        continue

    # Remove the parentheses () around the movie year
    if genresYear.startswith('(') and genresYear.endswith(')'):
        genresYear = genresYear[1:-1]

    # Skip the movies where the year is ????
    if genresYear.find("?") != -1:
        continue

    # Remove the /I, /II, /IV, /IX, /XL etc. suffix from the movie year
    genresYear = genresYear.split("/")[0]

    output.write(genres)
    output.write("\t")
    output.write(genresYear)
    output.write("\n")
    #print genresYear + "\t" + genres

Closing files


In [22]:
genresFile.close()
output.close()
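
The year cleanup in this step (dropping the parentheses and any /I, /II, ... suffix, and discarding unknown years) can be isolated in a small helper that is easy to test on its own. `clean_year` is a hypothetical name, and the sketch assumes the token is the parenthesized year field of a genres.list record.

```python
def clean_year(token):
    """Return the 4-digit year from tokens like (1994) or (2008/II), else None."""
    # Drop the surrounding parentheses, then anything after the first slash
    year = token.strip("()").split("/")[0]
    # Unknown years (????) and episode fragments are not purely numeric
    if len(year) == 4 and year.isdigit():
        return year
    return None
```

With this helper, `clean_year("(2008/II)")` gives `"2008"`, and `clean_year("(????)")` gives `None`.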

Read genreYear.txt, then sort and count the genres and the years. Print the result in the following format:

Year    1880    1887    1888    1889    1890    1891
Action  0   0   0   0   0   1

Note: genreYear.txt ends with a newline, so splitting its contents on '\n' yields a trailing empty string; the code below drops blank lines before counting.


In [23]:
from collections import Counter

with open('genreYear.txt') as sortGY:
    lines = sortGY.read().split('\n')

# Replace separating whitespace with exactly one space and drop blank lines
lines = [' '.join(l.split()) for l in lines if l.strip()]

# Sort genres and years
genres = sorted(set(l.split()[0] for l in lines))
years = sorted(set(l.split()[1] for l in lines))

# Count the sorted genres and years
countGY = Counter(lines)

In [24]:
OUT = open("sortedYear.txt", "w")
# Header row: "Year" followed by every year, tab-separated
OUT.write("Year" + "\t")
print "Year" + '\t',
for y in years:
    OUT.write(y + '\t')
    print y + '\t',
print
OUT.write('\n')
# One row per genre, with the count for each year
for g in genres:
    OUT.write(g + '\t')
    print g + '\t',
    for y in years:
        print str(countGY[g + ' ' + y]) + '\t',
        OUT.write(str(countGY[g + ' ' + y]) + '\t')
    OUT.write('\n')
    print
OUT.close()


Year	1874	1878	1887	1888	1889	1890	1891	1892	1893	1894	1895	1896	1897	1898	1899	1900	1901	1902	1903	1904	1905	1906	1907	1908	1909	1910	1911	1912	1913	1914	1915	1916	1917	1918	1919	1920	1921	1922	1923	1924	1925	1926	1927	1928	1929	1930	1931	1932	1933	1934	1935	1936	1937	1938	1939	1940	1941	1942	1943	1944	1945	1946	1947	1948	1949	1950	1951	1952	1953	1954	1955	1956	1957	1958	1959	1960	1961	1962	1963	1964	1965	1966	1967	1968	1969	1970	1971	1972	1973	1974	1975	1976	1977	1978	1979	1980	1981	1982	1983	1984	1985	1986	1987	1988	1989	1990	1991	1992	1993	1994	1995	1996	1997	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022	2024	2025	
Action	0	0	0	0	0	0	1	0	0	3	0	1	3	2	0	1	1	1	9	4	3	0	2	20	5	1	7	9	18	22	39	44	23	19	38	30	35	48	37	45	75	123	79	88	58	55	45	66	48	64	116	84	107	108	106	72	101	96	56	57	51	68	91	104	95	80	73	90	95	70	79	77	125	107	106	132	153	160	162	225	218	281	274	259	293	280	301	355	417	391	297	359	412	385	498	489	463	492	533	496	538	569	652	690	726	807	792	779	741	749	815	805	851	844	920	980	1063	1164	1262	1238	1348	1377	1480	1725	2282	3018	3246	3399	3551	3429	2613	631	173	46	13	13	2	0	0	0	
Adult	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	1	0	1	0	0	0	1	1	0	0	0	0	4	2	0	2	3	7	0	1	1	1	4	0	1	3	3	1	0	0	0	1	2	1	1	0	0	1	0	1	3	0	2	1	1	0	0	3	0	0	0	0	2	1	1	1	6	19	17	13	29	18	117	238	180	201	186	291	309	248	279	263	241	318	250	394	556	755	720	661	678	793	898	919	1352	1595	1568	1592	1239	1312	1517	1748	2135	2300	2570	3370	4206	4573	4379	4318	3844	3054	2726	2384	2416	2177	1653	333	0	0	0	0	0	0	0	0	0	
Adventure	0	0	0	0	0	0	0	0	0	0	0	0	0	1	1	1	0	3	5	4	4	5	9	9	13	22	26	68	75	108	81	55	70	72	71	84	71	61	57	50	86	123	63	69	73	55	45	63	52	83	90	89	107	122	119	102	105	87	74	71	55	67	99	91	109	103	102	129	137	144	139	125	147	144	151	152	197	208	186	220	232	295	303	292	263	284	269	267	213	223	224	199	213	223	236	191	216	230	221	219	251	264	274	269	288	296	288	298	331	359	381	376	430	532	520	586	620	684	707	618	709	755	711	863	1408	1965	2098	2175	2510	2434	1302	303	87	29	6	5	3	0	0	0	
Animation	0	0	0	0	0	0	0	3	0	1	0	0	0	1	2	2	1	1	1	1	1	2	8	22	20	19	8	23	125	60	152	315	282	186	173	288	241	201	171	180	242	278	236	198	185	201	227	251	230	226	223	233	208	229	194	202	211	235	198	196	158	155	166	188	193	173	194	177	192	192	183	178	185	157	182	183	191	257	255	204	262	267	377	247	256	260	283	242	256	242	270	239	264	329	299	312	293	315	317	347	368	373	414	406	380	440	450	438	405	497	509	578	619	664	791	897	878	920	1090	1148	1361	1458	1569	1793	2054	2547	2738	2784	2954	2579	905	115	51	25	1	2	0	0	0	0	
Biography	0	0	0	0	0	0	0	0	0	0	0	1	0	0	5	1	0	2	4	0	2	3	0	2	6	3	5	4	4	7	6	5	12	8	4	2	3	11	10	2	5	7	5	7	5	6	7	4	8	14	14	19	15	19	33	30	15	25	17	17	16	15	15	15	29	21	19	32	38	28	40	25	37	27	29	28	29	26	20	26	34	35	36	24	47	47	43	48	72	58	66	59	68	76	69	90	106	83	91	94	93	102	100	100	98	104	109	107	147	151	185	179	175	164	174	218	201	229	234	300	370	438	449	546	1309	2132	2386	2638	3079	3163	1041	136	31	3	0	1	0	0	0	0	
Comedy	0	0	0	0	0	1	0	1	0	4	9	35	106	188	200	224	158	229	424	338	361	391	729	1086	1319	1127	1678	2468	2961	3006	2655	2116	1578	1120	985	1198	953	877	731	775	920	966	1009	891	826	715	738	663	686	688	683	729	722	720	621	574	617	550	472	398	359	425	455	508	543	557	586	617	624	603	622	643	665	663	698	751	791	799	811	884	891	863	926	908	938	959	958	945	870	942	973	964	926	948	971	1035	1029	1058	1025	1150	1147	1255	1435	1320	1360	1357	1409	1414	1330	1472	1565	1715	1876	2034	2244	2343	2858	3162	3891	4289	4973	5625	6050	6948	9638	11777	13217	14094	15114	15003	7396	729	103	20	3	2	0	1	0	0	
Commercial	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Crime	0	0	0	0	0	0	0	0	0	0	2	2	7	2	1	9	2	1	18	19	29	29	21	50	55	60	60	121	185	201	158	105	96	49	91	97	92	70	79	72	40	69	52	68	77	94	115	132	107	108	131	170	165	168	181	117	96	108	65	72	86	114	148	143	172	195	167	133	148	172	173	147	210	193	201	246	196	177	190	200	176	236	275	263	267	216	258	296	345	303	274	293	274	231	246	230	210	237	261	252	311	320	350	311	376	400	351	376	397	375	405	428	433	462	450	567	622	723	731	697	837	912	1078	1140	1598	1856	2065	2065	2394	2476	1555	290	54	10	2	3	0	0	0	0	
Documentary	1	0	0	3	1	1	3	1	1	28	52	500	764	1136	1148	1235	1317	1171	1641	985	904	921	772	952	973	791	697	926	976	510	536	517	498	450	412	381	375	307	256	233	334	243	326	292	238	221	203	367	359	241	254	466	507	392	313	254	315	321	324	241	297	277	310	323	303	378	369	451	467	406	479	438	525	485	495	566	630	651	798	873	821	819	858	857	844	926	903	861	842	840	918	946	862	860	894	979	845	848	901	921	971	951	944	1009	1032	1165	1254	1369	1318	1558	1713	1908	2112	2481	2791	3184	3763	4261	5284	6165	6510	6731	7240	7780	9171	10313	11133	11730	12022	11771	5129	373	45	4	5	2	0	0	0	0	
Drama	0	0	0	0	0	0	0	0	0	1	5	8	21	31	57	60	40	29	123	84	109	144	305	803	1239	1255	1977	2774	3232	3191	3101	1964	1304	917	783	744	649	646	544	510	478	499	521	572	467	474	526	522	488	539	535	592	659	678	654	535	465	459	436	390	390	569	644	714	803	778	858	846	847	907	940	931	1070	1082	1158	1206	1281	1285	1243	1258	1336	1351	1491	1624	1771	1723	1603	1587	1566	1635	1518	1556	1526	1671	1710	1737	1742	1758	1798	1775	1832	1909	1962	1906	1929	1954	2015	1880	1977	2035	2104	2256	2427	2559	2787	3134	3445	4065	4837	5564	6819	7794	8491	9815	13541	17032	18622	20428	22166	22856	12879	1695	295	50	17	12	2	2	0	0	
Erotica	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Experimental	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Family	0	0	0	0	0	1	0	0	0	0	0	2	13	3	1	0	1	0	9	0	5	2	5	2	2	5	22	30	20	6	5	5	9	6	2	5	8	18	21	24	28	23	23	27	63	76	91	108	90	95	101	121	132	131	139	128	138	142	98	105	85	115	131	143	182	193	209	235	229	236	257	209	230	219	230	226	228	242	201	210	235	261	243	208	240	258	240	309	273	272	329	323	348	373	329	336	395	509	452	442	424	484	439	391	400	434	414	397	426	461	486	534	535	590	639	697	709	800	848	769	977	982	1028	1259	2018	2919	3212	3510	4062	4053	1359	135	31	5	1	0	0	0	0	0	
Fantasy	0	0	0	0	0	0	0	0	0	0	0	2	4	9	20	28	40	62	89	26	15	15	38	57	47	23	19	38	37	26	18	15	28	14	11	9	23	16	18	13	13	10	6	8	4	14	8	13	25	23	30	25	24	19	26	26	23	20	16	25	26	32	32	29	35	31	42	42	52	52	52	40	72	53	62	61	65	65	60	91	81	83	98	92	101	103	102	105	102	126	96	128	109	121	137	114	151	161	179	164	220	230	237	242	237	246	262	276	238	301	305	292	291	321	345	442	474	541	539	612	680	818	912	1100	1705	2407	2679	2778	3074	2772	1237	238	65	26	4	4	1	1	0	0	
Film-Noir	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	1	1	6	6	1	3	2	5	10	4	8	6	7	11	5	16	28	42	62	47	54	59	40	26	29	27	35	24	24	12	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Game-Show	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	2	6	4	3	0	1	0	14	8	10	23	19	22	28	31	20	19	26	23	32	18	19	20	16	21	20	26	24	31	9	33	20	28	25	19	22	34	34	35	29	30	28	34	30	42	48	42	43	43	63	51	69	52	65	65	78	72	93	71	90	127	127	171	127	135	161	213	278	271	251	202	176	226	205	192	135	50	2	0	0	0	0	0	0	0	0	
History	0	0	0	0	0	0	0	0	0	0	2	0	3	1	1	3	1	1	0	1	0	3	2	17	19	28	16	16	24	34	31	24	23	16	12	9	11	29	32	20	14	15	14	23	18	13	9	15	16	23	29	37	42	47	49	40	25	41	39	27	31	30	29	22	37	36	30	35	49	49	62	44	42	35	49	42	54	52	53	58	52	53	62	62	80	94	89	85	86	74	79	76	70	108	86	92	105	86	91	98	78	79	95	72	106	83	119	99	125	115	120	141	186	147	186	198	199	213	255	272	361	423	488	545	1182	1728	1866	2031	2320	2318	834	94	27	12	3	1	1	0	0	0	
Horror	0	0	0	0	0	0	0	0	0	0	0	1	3	3	4	2	1	1	0	0	0	2	1	4	2	4	2	7	10	9	13	8	11	6	11	22	12	7	7	9	7	8	9	5	9	5	14	27	20	18	17	23	8	4	28	18	18	18	19	25	20	20	6	14	18	8	17	9	22	27	23	35	66	81	63	66	56	65	74	75	70	77	72	100	95	126	153	176	185	177	141	123	122	116	110	151	167	158	147	123	179	210	252	300	290	275	238	227	228	216	265	287	297	317	403	514	491	613	718	902	1125	1389	1547	1824	2444	3016	3528	3711	4073	4221	3152	551	93	20	3	2	3	1	0	0	
Lifestyle	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	
Music	0	0	0	0	0	0	0	0	0	0	0	1	0	0	1	8	1	5	1	0	23	36	70	54	63	1	3	2	35	6	5	12	1	1	2	1	12	28	12	8	6	28	44	108	309	86	65	56	59	64	98	119	133	127	113	68	111	120	98	104	108	155	123	130	166	124	101	109	113	113	109	117	151	152	175	134	138	146	138	171	228	201	214	200	172	219	167	142	152	139	151	180	182	206	210	221	232	245	278	301	313	306	323	290	318	359	373	381	366	336	412	386	461	523	547	714	678	818	1012	1228	1311	1333	1356	1473	1948	2537	2944	3246	3332	3088	920	35	7	0	1	0	0	0	0	0	
Musical	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	13	0	0	0	1	1	1	1	40	40	50	2	0	2	1	0	1	2	1	0	2	2	2	0	4	6	5	21	21	179	252	120	118	158	197	195	177	189	179	125	113	134	98	118	146	124	113	99	100	146	109	133	122	124	136	125	152	146	139	156	153	139	157	145	157	173	192	208	156	135	150	144	167	118	114	116	120	103	142	117	132	137	154	144	128	126	136	121	114	106	103	99	100	100	120	112	124	120	118	101	163	163	202	224	260	315	327	302	377	535	586	662	683	679	668	290	33	9	1	0	1	0	1	0	0	
Mystery	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	1	1	1	5	12	6	6	21	28	52	76	52	63	44	30	56	71	47	47	35	11	25	21	22	43	39	66	82	63	69	79	79	74	83	88	67	72	75	48	62	75	89	85	63	59	54	49	52	42	52	43	45	57	59	75	64	60	79	68	74	77	75	83	84	86	68	103	107	93	96	111	84	99	92	93	101	106	104	126	100	129	113	145	131	167	146	159	171	175	183	200	204	213	237	225	303	339	422	447	480	620	638	704	790	1373	2003	2173	2363	2647	2587	1264	174	42	7	2	3	0	1	0	0	
News	0	0	0	0	0	0	0	0	0	0	3	7	60	68	109	9	25	45	19	2	4	4	0	2	0	3	22	143	277	288	188	226	194	2	2	0	1	14	6	7	2	2	0	2	2	2	5	1	5	1	7	3	5	5	9	1	12	1	1	0	3	4	8	11	11	5	15	17	15	21	8	26	18	13	22	19	9	10	12	11	17	10	12	21	6	20	13	16	6	13	15	12	14	21	17	32	34	31	27	38	30	28	30	40	48	52	34	79	73	70	71	89	66	81	63	86	108	117	104	145	230	240	280	259	662	1027	1207	1252	1384	1329	312	3	0	0	0	0	0	0	0	0	
Reality-TV	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	1	2	0	1	1	2	0	1	2	3	2	2	6	4	4	3	5	1	5	4	2	2	4	5	6	3	6	9	3	8	12	14	8	7	18	14	12	21	21	30	14	26	42	40	36	32	39	54	51	39	74	74	134	210	217	328	473	660	787	931	934	931	1011	1202	1370	1367	1060	441	19	1	0	0	0	0	0	0	0	
Romance	0	0	0	0	0	0	0	1	0	1	0	1	1	1	2	0	3	0	8	2	3	4	10	26	38	147	226	391	325	241	204	142	104	125	138	85	118	146	97	116	149	250	185	174	179	238	225	222	233	272	318	271	339	315	237	218	228	209	164	165	142	166	170	169	208	173	201	212	207	231	276	246	253	283	283	278	277	281	258	271	322	325	369	359	387	361	318	313	238	301	255	237	242	267	279	326	294	305	327	309	373	316	355	371	333	329	363	417	411	419	502	502	591	620	648	767	875	984	1078	1142	1307	1328	1439	1696	2343	3153	3447	3737	4234	4267	1901	238	58	13	1	3	3	1	1	0	
Sci-Fi	0	0	0	0	0	0	0	0	0	0	1	0	0	1	0	0	0	1	0	1	1	1	1	4	2	3	4	3	1	4	5	8	1	6	2	8	5	4	2	7	6	1	2	2	4	4	6	9	14	8	19	13	9	10	10	15	9	15	8	13	5	5	5	7	15	13	25	17	36	36	28	54	59	66	59	45	43	51	60	50	77	109	97	67	75	67	68	79	95	84	81	63	102	128	171	175	180	171	202	176	217	239	252	236	239	215	250	237	255	275	288	339	320	342	383	362	413	472	487	568	616	672	790	938	1364	1785	1981	2305	2540	2677	1745	327	95	25	7	7	0	0	0	0	
Short	1	1	1	5	2	5	9	9	2	93	117	819	1323	1712	1777	1800	1741	1801	2655	1828	1665	1820	2450	4255	5082	5127	6128	7889	8782	7697	6521	4512	2968	1743	1490	1721	1484	1308	1059	1082	1222	1236	1260	1230	1473	998	888	825	771	880	810	879	956	923	769	670	827	776	678	680	667	664	661	737	743	708	753	749	785	757	836	818	855	802	774	818	908	1045	1061	1172	1190	1272	1391	1257	1287	1212	1260	1148	1113	1118	1198	1188	1055	1140	1122	1172	986	971	984	1096	1217	1249	1256	1340	1360	1518	1540	1714	1754	2114	2317	2681	2893	3566	4001	4626	5315	6370	7770	10103	12349	13317	14865	18191	23416	28773	33125	36171	38741	38256	14782	320	20	7	0	1	0	0	0	1	
Sport	0	0	0	0	0	0	1	4	0	10	4	19	63	54	156	54	93	62	88	52	68	81	64	110	73	30	12	27	15	12	19	16	9	4	18	17	34	47	19	50	47	44	49	34	54	29	76	76	39	31	40	50	52	60	39	31	44	41	26	21	21	41	42	53	61	40	59	56	41	43	44	52	51	23	27	25	27	30	25	42	44	42	34	61	56	82	68	84	106	82	91	132	186	132	104	91	89	86	126	159	138	143	142	155	168	169	181	189	163	220	180	232	200	217	259	277	322	358	367	498	642	836	844	858	932	1058	1072	1166	1218	1139	422	45	9	1	0	0	0	0	0	0	
Talk-Show	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	3	0	0	0	0	2	2	1	0	0	0	0	0	0	0	1	6	11	11	13	18	15	17	13	9	15	30	17	23	22	16	19	15	22	19	21	38	25	18	25	22	24	25	22	22	28	25	31	30	37	30	49	49	52	54	53	68	57	62	75	88	110	113	148	135	151	163	162	174	158	196	210	238	266	351	465	484	404	446	510	607	715	654	614	209	2	0	0	0	0	0	0	0	0	
Thriller	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	4	1	6	30	30	105	117	104	105	93	39	14	10	13	11	12	8	17	17	26	18	21	20	11	21	22	26	25	28	36	42	29	46	33	39	54	45	39	48	58	65	69	68	65	63	50	63	64	62	69	86	88	113	91	96	99	88	117	129	126	168	159	158	144	189	203	235	236	203	190	183	194	194	202	196	202	219	242	253	258	355	357	388	438	435	445	482	465	565	571	593	630	671	748	784	943	1004	1068	1267	1475	1640	1942	2827	3546	4023	4486	4976	5347	4103	793	122	24	2	4	0	0	0	0	
War	0	0	0	0	0	0	0	0	0	0	0	0	9	55	35	47	34	2	5	44	7	2	1	15	33	17	50	39	47	101	103	58	66	127	32	8	8	5	12	12	11	32	33	35	22	28	27	19	32	28	27	46	132	94	71	70	125	209	204	168	124	61	40	46	56	57	72	57	63	54	69	74	76	97	114	110	95	107	102	112	113	105	102	130	115	132	93	74	79	78	88	85	72	74	89	82	94	75	109	90	129	103	110	114	107	101	88	77	83	81	93	76	106	98	125	123	186	179	195	260	276	325	383	475	549	731	707	703	790	787	369	63	22	7	1	1	0	1	0	0	
Western	0	0	0	0	0	0	0	0	0	1	0	0	0	0	4	0	1	0	8	4	3	8	20	48	106	371	511	579	498	406	307	219	181	139	209	183	215	214	162	219	288	248	210	207	120	112	102	115	67	92	159	141	141	129	136	145	138	129	109	114	95	106	112	137	151	155	127	137	102	93	110	108	120	101	83	88	74	82	54	92	123	150	136	139	108	111	145	136	82	66	73	62	39	49	50	31	29	29	22	23	23	31	25	26	21	26	41	28	32	45	54	25	23	32	37	25	33	50	67	52	78	80	106	138	198	273	303	314	330	354	220	52	15	1	1	0	0	0	0	0	
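
The table above is a pivot of (genre, year) counts. As a sketch that runs on both Python 2.7 and 3, the same pivot can be built from pairs with a `Counter` keyed by tuples instead of joined strings. `pivot_counts` and the sample pairs are hypothetical and only illustrate the layout.

```python
from collections import Counter


def pivot_counts(pairs):
    """Build a genre-by-year table from an iterable of (genre, year) pairs.

    Returns (header, rows): header is ["Year", y1, y2, ...] and each row is
    [genre, count_y1, count_y2, ...], with 0 for combinations that never occur.
    """
    counts = Counter(pairs)
    genres = sorted(set(genre for genre, _ in counts))
    years = sorted(set(year for _, year in counts))
    header = ["Year"] + years
    rows = [[genre] + [counts[(genre, year)] for year in years] for genre in genres]
    return header, rows


# Illustrative pairs, not taken from genreYear.txt
sample_pairs = [("Action", "1891"), ("Comedy", "1890"),
                ("Comedy", "1891"), ("Comedy", "1891")]
header, rows = pivot_counts(sample_pairs)
```

Each row can then be written with `'\t'.join(str(cell) for cell in row)` to reproduce the tab-separated layout.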

Lineplot visualization

Note: The movie genres are not properly displayed as the legend on the line plot.


In [25]:
Image(filename='Movie Genre lineplot.png')


Out[25]:

According to the IMDB movie genres data, the number of films produced rose after 1900 and reached an early peak in 1913, with 8782 Shorts, 3232 Dramas, 2961 Comedies, 976 Documentaries and 498 Westerns as the 5 most popular movie genres that year. Production declined sharply between 1914 and 1917, likely because the onset of World War I disrupted the movie industry. Of the 32 movie genres in the table, Short, Drama and Comedy remained the 3 most popular throughout most of the 20th century, except from 1990 to 2009, when Documentary and Adult films emerged as competing genres to Dramas and Comedies. The largest number of films was produced in 2013, with 38741 Shorts, 22166 Dramas, 15114 Comedies, 12022 Documentaries and 4976 Thrillers made that year.
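
Rankings like the ones quoted above can be read off the tab-separated table programmatically. The sketch below assumes the layout written to sortedYear.txt (a "Year" header row followed by one row per genre, with a trailing tab on each row). `top_genres` is a hypothetical helper, and the sample rows copy only the 1913 column for a few genres from the table.

```python
def top_genres(table_lines, year, n=5):
    """Return the n (genre, count) pairs with the highest count for `year`.

    table_lines: lines of a tab-separated table whose first row starts with
    "Year" and whose remaining rows start with a genre name.
    """
    rows = [line.rstrip("\n").rstrip("\t").split("\t") for line in table_lines]
    rows = [row for row in rows if row and row[0]]  # drop blank lines
    header, body = rows[0], rows[1:]
    column = header.index(year)
    counts = [(row[0], int(row[column])) for row in body]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:n]


# 1913 column for a few genres, copied from the table above
sample_table = [
    "Year\t1913\t\n",
    "Comedy\t2961\t\n",
    "Documentary\t976\t\n",
    "Drama\t3232\t\n",
    "Romance\t325\t\n",
    "Short\t8782\t\n",
    "Western\t498\t\n",
]
top_1913 = top_genres(sample_table, "1913")
```

For this sample, `top_1913` lists Short, Drama, Comedy, Documentary and Western in that order, matching the 1913 ranking described above.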

